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system  conceptualization  known  as  a  semantic  net  which  will 
net  be  explored  here. 
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the  hierarchy  is  linearized.  This  makes  multi- structuring  very 
difficult.  Name  resolution  also  becomes  difficult. 

The  notion  of  a  ’data  base1  became  more  popular 
especially  in  the  context  of  large,  shared  data  bases.  It  became 
clear  that  user  programs  do  not  create  data  or  data  structure, 
but  instead  select  from,  and  insert  into,  predefined  collections 
of  lata  items.  These  collections  possess  their  own  internal 
structure  an!  should  no*  (cannot)  be  further  structured  for  all 
the  unknown  potential  uses  to  which  the  data  base  can  be  put. 
Furthermore,  the  essential  meaning  of  the  data  resides  in  this 
unalterable  structure.  This  led  to  the  notion  of  data  model. 

COD^SYL,  first  in  1969,  and  then  more  formally  in  the 
A.pril  1971  DP TG  report  [4]  offered  their  approach  to  the  notion 
of  data  base  modelling.  Codd  working  independently  offered  an 
alternative  to  the  DBTG  idea,  the  relational  model  [ 1  ]. 

Codd's  objection  to  the  DBTG  report  (and  that  of  others 
as  well)  was  that  DBTG  had  allowed  too  many  aspects  of  data  base 
description  to  be  combined  into  one  presentation.  Codd  gave  an 
example  of  how  the  very  same  data  model  could  be  expressed  in 
many  different  hierarchical  models  resulting  in  guite  different 
logical  access  to  the  same  data  items.  Furthermore,  he  argued 
that  this  logical  access  often  unduly  influenced  physical  access. 
On  a  different  level,  it  seemed  that  the  problem  of  linearization 
was  not  eliminated  in  the  DBTG  model  as  a  result  of  the  influence 
of  'currency*  indicators. 

In  a  long  series  of  papers  (which  also  generated  many 
papers  from  other  authors) ,  Codd  argues  that  an  alternative 
presentation,  the  n-ary  relational  model,  was  more  satisfying  as 
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regards  data  base  modelling  (e.g.,  [1],  [2],  [3]).  He  discussed 
the  fundamentals  of  the  idea  of  relations,  normal  forms, 
operators,  and  relational  language. 

A  relation  is  the  formal  expression  of  an  association 
between  the  attributes  participating  in  the  relation.  One  could 
perhaps  write  out  this  formal  expression  (in  fact  one  must  when 
implementing  relations)  in  a  formal  language,  e.g.,  Algol  60  or 
first-order  predicate  calculus.  (As  a  side  effect  of  using  the 
latter,  we  get  that  the  attributes  of  a  relation  must  be 
’simple*,  i.e.,  must  not  be  relations.  This  gives  an  independent 
explanation  for  the  introduction  by  Codd  of  first  normal  form.) 
This  formal  expression  is  one  of  many  equivalent  presentations  of 
the  concept  ’relation'.  Another  is-  as  a  table,  whose  rows 
represent  instances  of  the  relation.  Though  this  latter  object  is 
a  valid  presentation  of  a  relation,  it  can  be  misleading  in  much 
the  same  way  that  a  snapshot  of  an  executing  program  can  be 
misleading.  It  is  surmised  that  it  was  to  this  latter  type  of 
presentation  that  Todd  was  referring  when  he  used  the  phrase 
*  time -va rying  relation*,  since  relations  are  not  *  time- varying* , 
only  certain  of  their  presentations  are. 

On  first  glance  there  seems  to  be  no  reason  why  a 
relation  cannot  be  a  quite  complex  association  of  a  large  number 
of  attributes. 

Such  a  large  complex  association  can  be  as  unpleasant  an 
object  to  deal  with  as  a  DBTG  network,  however.  This  is  due 
principally  to  the  potential  for  the  existence  of  many  intra- 
relational  associations  among  the  attributes.  The  intra- 
relational  associations 


manifest  themselves  in  many  ways.  For 
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example,  if  the  relation  is  presented  as  a  table,  then  the  table 
might  possess  considerable  redundancy.  If  the  relation  is 
presented  f unctiorally ,  one  might  notice  many  common  internal 
function  calls. 
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associating  them  through  some  common  attributes.  (Such  a  desire 
is  in  fact  often  the  reason  for  the  initial  construction  of  large 
complex  associations,  which  then  leads  to  the  first  problem.)  Now 
such  associations  can  be  quite  misleading  if  they  are  performed 
naively;  in  fact,  information  may  be  created  by  such 
associations.  (See  Todd’s  remarks  on  the  ’connection  trap'  [1].) 

It  thus  turns  out  that  large  relations  may  be  desirable 
in  order  to  extract  all  the  information  from  the  known  attribute 
associations,  but  such  large  associations  may  in  turn  generate 
problems.  What  guides  the  way  in  which  attributes  may  be 
meaningfully  associated? 

We  remark  now  that  the  problems  encountered  are  not 
unique  to  relations,  though  they  are  most  clearly  seen  when  using 
relations.  They  are  important  whenever  data  attributes  are 
grouped  together  (e.g.,  in  an  owner  coupled  set  of  the  DBTG 
network  model)  .  The  problems  stem  from  the  fact  that  certain 
attributes  completely  determine  other  attributes.  For  example, 
employee  number  might  determine  name,  address,  manager  name,  etc. 
Manager  name  determines,  say,  job  description,  manager  address, 
etc.  Such  relationships  among  attributes,  where  one  attribute 
fully  determines  other  attributes,  are  called  functional 
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dependencies  (FPs)  .  (Following  Codd  [2],  we  write  A->B  for  "A 
functionally  determines  B"„)  Certain  kinds  of  FDs  among  a  set  of 
attributes  may  make  it  undesirable  to  group  that  set  together. 
For  example,  assume  employee  number,  manager  name,  and  manager 
address  are  grouped  together  (in  a  single  relation,  say).  This 
grouping  has  several  undesirable  properties.  First,  an 
association  between  manager  and  a  manager  address  is  repeated  in 
the  relation  for  every  employee  in  the  relation,  creating 
unwanted  redundancy.  This  redundancy  in  turn  creates  other 
difficulties.  If  an  employee  is  deleted  from  the  relation,  and  he 
happens  to  be  the  last  employee  of  a  manager,  then  the  manager 
information  also  disappears.  This  is  called  a  de let ion  anomaly. 
Similarly,  if  a  person  is  inserted  who  happens  to  be  the  first 
employee  of  a  manager,  this  also  introduces  a  new  manager  into 
the  relation.  This  is  called  an  insertion  anomaly .  Such  anomalies 
are  undesirable,  since  the  user  is  not  likely  to  realize  the 
consequences  of  his  insertion  or  deletion.  Consistency  of  the 
relation  is  also  jeopardized  via  updates.  If  an  association 
between  a  manager  name  and  manager  address  is  changed  for  one 
employee.  Then  to  maintain  consistency  this  association  should 
be  changed  for  all  employees  of  this  manager.  Again,  the  user  (or 
system)  may  not  realize  this,  since  not  all  updates  have  this 
effect . 


What  is  really  desired,  then,  is  a  grouping  of 
attributes  that  does  not  exhibit  these  difficulties.  Codd's 
original  description  of  normalization  was  a  process  of 
decomposition  of  a  relation.  We  propose  that  a  constructive 
approach  is  more  reasonable  for  obtaining  groupings  of 


more 
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attributes.  The  structure  of  such  a  grouping  is  completely 
determined  by  the  FDs  that  exist  among  subsets  of  attributes. 
Since  the  FDs  must  be  given  a  priori,  it  makes  sense  to  use  them 
in  the  construction  of  normalized  groupings.  We  will  show  that 
the  construction  can  be  done  algorithmically.  This  has  two 
obvious  advantages:  the  process  can  be  automated  and  the  result 
is  guaranteed  to  have  the  desired  properties.  Using  the 
constructive  approach,  it  is  also  convenient  to  add  new 
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concept,  we  assume  that  each  data  base  is  described  to  the  system 
as  a  set  of  FDs  over  a  set  of  attributes.  The  system  must  now 
find  a  method  of  storing  data  instances  associated  with  these 
FDs.  One  possible  method  is  to  store  these  FDs  as  relations.  To 
do  this  we  must  find  a  way  of  generating  a  relational  schema  that 
is  a  complete  representation  of  the  given  FDs.  The  remainder  of 
this  paper  will  be  directed  toward  finding  algorithmic  methods  of 
generating  such  a  schema. 

One  simple  way  of  getting  a  relational  schema  from  a  set 


of  FDs  is  to  group  together  attributes  that  are  all  functionally 
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dependent  on  the  same  small  set  of  attributes.  This  suggests  the 
following  algorithm: 

A1  gor ithm  A 

1.  Partition  the  set  of  FDs  into  groups  such  that  all  the 
FDs  in  each  group  have  exactly  the  same  left  sides. 

2.  For  each  group  construct  a  relation  consisting  of  all 

4 

the  attributes  appearing  in  that  group.  The  left  siie  of 
the  FDs  in  the  group  will  be  a  key  for  the  relation. 

There  are  many  problems  with  Algorithm  A.  First,  the  set 
of  FDs  may  be  highly  redundant.  That  is,  many  FDs  are  derivable 
from  other  FDs  using  a  transitivity  rule  (e.g.,  if  A->B  an3  B->C 
are  given  then  A->C  is  redundant) .  Many  of  these  extra  FDs  will 
generate  unwanted  redundancy  in  the  induced  relational  schema. 
Second,  the  resulting  relations  may  not  be  in  third  normal  form, 
thus  allowing  insertion/deletion  anomalies  and  consistency 
problems  to  arise.  A  third  problem  is  that  of  isolating  all  of 
the  keys  in  the  constructed  relations.  Some  of  the  attributes  on 
the  right  side  of  the  FDs  may  also  functionally  determine  the 
left  side  attributes,  thus  establishing  other  attributes  as  keys. 
In  addition,  there  is  no  measure  on  how  efficient  this  schema  is. 
For  example,  can  some  of  the  relations  be  further  agglomerated  to 
form  a  smaller  schema?  We  continue  by  discussing  each  of  these 
problems  in  more  detail. 

3.  5 und an 

Some  of  the  FDs  that  are  used  to  describe  the  data  base 
may  be  redundant  in  the  sense  that  they  are  derivable  from  the 
others.  If  we  use  Method  A,  then  redundant  FDs  will  induce 
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FD 


1.  A  B->C 

2.  CD-  >  E 

3.  ABD->E 


Figure  1 

In duced  Relation 
via  Method  A 
P(ArB,C) 

S (C  ,  D  ,  E) 

T  (A,B,D,E) 


The  relation  T  can  be  obtained  by  joining  R*S  on  attribute 
C.  By  observing  FD  #3  is  redundant,  we  avoid  producing  the 
redundant  relation  T. 
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3.1  F±Hiing.  Nonre dundan t  FD  Sets 

Delohel  [5]  presents  the  following  rule  for  deriving  FDs 
from  other  FDs. 

Rule-T  (Pelobel) :  Let  A ,  B,  C,  and  D  each  be  a  set  of 

attributes,  and  suppose  that  A->B  and  C->D  are 
FDs.  If  BSc,  then  AL/(C-B)  ->D  is  also  an  FD. 

Given  a  set,  X,  of  FDs  then  the  T-closure  of  X  (denoted  X+ )  is 
the  set  of  all  FDs  that  can  be  obtained  by  repeated  application 
of  ?ule-T.  We  will  always  assume  X  is  finite;  hence,  the  T- 
closure  of  X  is  also  finite. 

Compositions  of  FDs  according  to  Rule-T  may  not  have  the 
intended  semantic  meaning.  For  example,  let  E#  be  employee 
number,  D#  be  department  number,  and  LOC  be  location.  Say, 
E#->LOC,  F#->D*,  and  D#->LOC.  By  composing  the  second  two  FDs 
under  Rule-T,  we  obtain  the  first  FD.  But  one  might  discover  that 
LOC  for  the  first  FD  means  the  employee's  home  address,  while  LOC 
for  the  third  FD  means  the  department's  address.  On  one  level 
this  is  just  a  naming  problem;  *:he  single  word  LOC  was  used  in 
two  distinct  ways.  However,  there  exist  more  complex  examples 
where  name  resolution  is  not  sufficient  to  solve  the  semantic 
difficulty.  Such  semantic  problems  are  beyond  the  scope  of  this 
paper.  Hence,  we  will  assume  that  there  exists  a  semantic 
procedure  which  can  distinguish  between  semantically  valid  and 
invalid  applications  of  Rule-T.  In  a  succeeding  paper,  we  hope  to 
describe  how  such  a  routine  might  be  constructed. 

A  set  of  attributes  is  simple  if  it  has  cardinality  one. 
Otherwise,  it  is  composite.  If  A,  B,  C,  and  D  are  simple,  then 
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Rule  T  is  exactly  the  same  as  transitivity  in  a  directed  graph, 
where  the  attributes  are  nodes  and  the  FDs  are  edges.  An  FD  is 
simple  if  its  left  and  right  sides  are  both  simple.  If  X  is  a  set 
of  simple  FDs,  then  the  T-closure  of  X  is  exactly  the  transitive 
closure  of  X  when  X  is  viewed  as  a  directed  graph. 

Now,  suppose  a  set  of  FDs,  X,  is  supplied  from  which  a 
relational  schema  is  to  be  formed.  Then,  what  is  needed  is  a 
nonredundant  subset  of  X  whose  T-closure  is  the  same  as  the  T- 
closure  of  X.  Call  this  the  nonredundant  covering  problem .  Even 
better,  we  would  like  a  subset  of  X  which  is  a  nonredundant 
covering  and  whose  cardinality  is  minimal  among  all  such 
coverings.  This  set  is  called  a  minimal  covering  of  X. 

The  minimal  covering  problem  is  computationally  quite 
difficult.  Consider  the  minimal  covering  problem  on  a  set  of 
simple  FDs.  This  is  equivalent  to  finding  a  subgraph  G'  of  a 
directed  graph  G  such  that  the  transitive  closure  of  G'  is  the 
same  as  the  transitive  closure  of  G,  and  the  number  of  edges  in 
G'  is  minimal  among  all  subgraphs  of  G  with  that  transitive 
closure  property.  This  problem,  called  the  ’’minimal  equivalent 
graph"  problem,  was  proven  to  be  NP-complete  by  Sahni[6].  Delobel 
[5]  has  given  an  algorithm  for  finding  a  minimal  covering  on  a 
set  of  composite  FDs.  Analysis  of  this  algorithm  shows  some 
simple  cases  which  run  at  speeds  of  order  K2.'  ,  where  K  is  the 
number  of  attributes.  Obviously,  Delobel’s  algorithm  is 
unfeasible  for  even  relatively  small  values  of  K. 

The  minimal  covering  problem  can  be  relaxed  somewhat  by 
looking  for  a  nonredundant  covering  instead  of  a  minimal  one. 
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There  is  an  obvious  algorithm  to  find  a  nonredundant  covering  of 
a  set,  X,  of  FDs. 

A.lgor ithm  B 
Do  for  each  FD  f  in  X; 

Y  :=  X  -  f; 

If  f€Y+  then  X  :=  Y; 

end; 

For  simple  FDs,  the  predicate  f<Y+  can  be  calculated  by 
finding  the  transitive  closure  of  the  associated  direct  graph. 
Transitive  closure  is  0(k3),  so  the  whole  algorithm  is  0(nk3), 
where  n  is  the  number  of  FDs  and  k  the  number  of  attributes.  The 
problem  with  generalizing  this  method  for  composite  FDs  is  that 
the  T-closure  is  larger  and  more  difficult  to  calculate  than  the 
transitive  closure  of  a  directed  graph.  However,  the  predicate 
f€Y+  does  not  require  that  the  closure  actually  be  constructed. 
Thus,  ^he  problem  reduces  to  solving  the  membership  problem  for 
the  T-closure  of  a  set  of  FDs. 

3.2  A.n  Algorithm  to  Solve  the  Membership  Problem 

To  discuss  our  algorithm  to  compute  the  predicate  f*  Y+ , 
we  present  an  isomorphic  formulation  of  the  problem.  Instead  of 
thinking  of  FDs  of  the  form  X->Y,  we  will  invert  and  consider 
Eiolucf of  the  form  Y->X.  The  combination  rule  T  is  now 
rewritten  as: 

Pule-T*:  Let  D->C  and  B->A  be  two  productions  with  BSC.  Then 

D-> (C-B)U  A  is  also  a  production. 
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Fule-T*  looks  very  much  like  a  grammatical  substitution  rule, 
with  the  exception  that  we  are  forming  sets,  not  strings.  That 
is,  if  D->C  and  B->fc  are  productions,  then  substitute  A  for  the 
occurrence  of  B  in  C.  For  the  sake  of  exposition,  we  restrict  FDs 
to  always  having  the  form  X->Y  or  XY->Z,  where  X,  Y,  and  Z  are 
simple.  (We  will  show  how  to  generalize  the  results  later.)  This 
restriction  implies  D  and  B  in  Pule~T*  are  always  singleton 
symbols.  That  is,  the  rule  is  always  a  substitution  rule  for  the 
single  symbol  (i.e.,  attribute)  B  appearing  in  D.  Note  that  this 
is  identical  to  our  original  formulation,  except  we  are  reversing 
the  direction  of  the  arrows. 

Assume  we  are  trying  to  derive  Z->XY  by  combining  a 
group  of  other  productions  via  Hule-T*.  Then  we  call  Z  the  start 
symbol  and  XY  the  goal  symbols.  If  we  are  working  on  a  derivation 
where  Z-->A.-->XY  (where  -->  means  Fule-T*  was  applied  some  number 
of  times,  and  A  is  possibly  a  composite  attribute) ,  then  we  call 
A  an  intermediate  set.  Now,  in  each  step  of  a  derivation  exactly 
one  symbol  is  replaced  by  one  or  two  other  symbols,  thus  creating 
a  new  intermediate  se+.  If  the  intermediate  set  ever  eguals  the 
goal  symbols  XY,  then  the  derivation  is  complete.  This 
replacement  process  can  be  mirrored  in  the  construction  of  a 
binary  derivation  tree.  The  root  of  the  tree  is  the  start  symbol. 
If  a  leaf  of  the  tree  is  replaced  (in  the  derivation)  by  one  or 
two  other  symbols,  then  those  symbols  become  children  of  the 
"replaced"  symbol.  The  union  of  all  symbols  that  appear  as  leaves 
of  the  tree  is  the  current  intermediate  set  for  the  derivation.  A 
sample  derivation  with  corresponding  tree  construction  is  given 

o 

in  figure  2. 


Figure  2  A  Sample  Derivation 
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Finding  a  derivation  for  a  production  Z->XY  from  a  set 
of  given  productions  is  equivalent  to  finding  a  derivation  tree 
with  root  Z  all  of  whose  leaves  are  labeled  X  or  Y.  (At  least  one 
is  labeled  X  and  at  least  one  is  labeled  Y.)  Define  a 
£3M  .^Xn^-DT  to  be  a  derivation  tree  (DT)  all  of  whose  leaves 
are  elements  of  the  set  {XI,...,  Xn}  .  oiir  algorithm  for  finding  a 
derivation  tree  for  Z->XY  works  by  building  up  {XY}-DTs. 

Assume  w->X  is  a  production.  Then  W  can  root  an  {X} -DT. 
Now,  if  U->wr  tt  can  also  root  an  {X}-DT  by  F.ule-T*.  Suppose  we 
want  to  construct  a  list,  call  it  LISTX,  of  all  symbols  that  can 
root  an  {X}-DT.  Begin  by  putting  X  on  LISTX.  For  each  symbol  A  on 
LISTX,  add  all  symbols  B  such  that  B->A  is  a  production.  Continue 
to  apply  this  rule  until  no  new  symbols  can  be  added.  The 
resulting  LISTX  has  the  desired  property.  We  can  construct  a 
list,  say  LISTY,  of  all  symbols  that  can  root  {Y} -DTs  by  applying 
the  same  procedure  to  a  list  initially  containing  Y.  With  LISTX 
and  LISTY,  we  can  now  find  all  possible  roots  for  {XY} -DTs  by 
using  the  following  observation.  A  symbol  ,  Q,  can  root  an  {XY}  - 
DT  iff: 

(i)  Q->X Y  is  a  given  production,  or 

(ii)  Q->WY  is  a  given  production  and  W  is  on  LISTX,  or 

(iii)  Q->XW  is  a  given  production  and  W  is  on  LISTY,  or 

(iv)  Q->VW  is  a  given  production  and  V  is  on  LISTX  and  W  is 
on  LISTY. 

We  can  construct  LISTXY,  the  set  of  all  roots  of  {XY}-DTs,  by 
successively  adding  each  Q  that  satisfies  one  of  (i)  through  (iv) 
until  no  new  Q  can  be  added  to  the  list.  Z-->XY  iff  Z  is  on 


LISTXY. 
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Space  does  not  permit  us  to  go  into  the  details  of  how 
to  build  up  these  lists  efficiently.  However,  it  can  be  shown 
that  the  above  algorithm  for  finding  the  derivations  of  an  FD  of 
the  form  XY->Z  runs  in  time  0  (k2)  ,  where  k  is  the  total  number  of 
attributes  in  all  of  the  FDs.  If  the  left  sides  of  the  FDs  are 
permitted  to  be  as  big  as  m  (i.e.,  X1,X2,...,Xm  ->  Z) ,  then  the 
algorithm  can  be  generalized  so  that  for  each  subset  X  of 
{X1f. .. ,Xm}  it  builds  up  a  list  of  all  attributes  which  can  root 
an  X-PT.  In  this  case,  the  algorithm  runs  in  time  3(km).  However, 
we  expect  real  data  bases  will  rarely  be  sufficiently  complex  to 
force  the  algorithm  close  to  this  worst  case  speed. 

4  .  Finding,  Relations  (Fev isit ed) 

The  previously  described  algorithm  for  testing  the 
predicate  f€Y+  can  be  used  to  implement  Algorithm  B  for  finding  a 
nonredundant  covering.  The  running  time  of  Algorithm  B  is  0(nkm), 
for  k  attributes  and  n  FDs  whose  left  sides  have  at  most  m 
attributes.  Applying  Method  A  to  this  nonredundant  covering 
yields  a  set  of  relations  which  "covers”  the  given  set  of  FDs. 
Two  problems  regarding  these  relations  remain  to  be  solved: 
finding  all  the  candidate  keys  of  each  relation  and  putting  the 
relations  into  third  normal  form. 

In  [2]  Codd  shows  how  to  decompose  relations  into  third 
normal  form  by  eliminating  all  partial  dependencies  and 
transitive  dependencies  of  nonprime  attributes  on  any  candidate 
key.  A  nonprime  attribute  is  one  that  does  not  participate  in  any 
candidate  key.  An  attribute  X  is  partially  dependent  on  a  group 
of  attributes,  Y,  if  for  some  subset  Z  of  Y,  Z->X.  An  attribute  X 
is  transitively  dependent  on  a  group  of  attributes,  Y,  if  there 
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is  an, attribute  Z  such  that  Y->Z,  Z/>Y,  and  Z->X.  Partial  and 
transitive  dependencies  have  been  shown  to  induce 
insertion/deletion  anomalies  and  consistency  problems,  and  are 
therefore  undesirable  properties  for  a  user's  relational  schema 
[2]. 

Let  X->YZ  be  an  FD  in  the  nonredundant  covering.  By 
Algorithm  A.,  this  FP  becomes  a  relation  F.  (XYZ)  ,  where  X  is  a  key. 
We  claim  that  there  cannot  be  any  transitive  dependencies  in  F. 
Assume,  for  instance,  that  Z  is  transitively  dependent  on  X 
through  Y.  That  is,  X->Y,  Y/>X,  and  Y->Z.  But  by  Rule-T,  if  X->Y 
and  Y - >Z  are  both  in  the  closure  of  the  given  set  of  FDs,  then 
X->Z  must  be  redundant.  This  contradicts  the  fact  that  X->YZ  was 
in  a  non redundan4-  covering.  Hence,  the  transitive  dependency 
cannot  exist. 

To  find  all  the  keys  of  the  relation,  we  must  find,  those 
subsets  of  attributes  in  the  relation  which  functionally 
determine  all  other  attributes  of  the  relation.  Partial 
dependencies  can  be  detected  by  using  the  following  key  finding 
algorithm.  Consider  a  relation  F.  (XI,  ...,  Xn)  .  Let  X={X1,  ..., 
Xn}.  For  each  subset,  Y,  of  X  construct  LISTY,  i.e.,  the  set  of 
all  attributes  that  can  root  a  (Y}-DT.  Y  is  a  key  of  R  iff  all  of 
the  elements  of  the  set  X  -  Y  appear  on  LISTY.  If  some,  but  not 
all  of  the  elements  of  X  -  Y  appear  on  LISTY,  and  if  there  is  a 
key  X  which  properly  contains  Y,  then  those  elements  on  LISTY  are 
partially  dependent  on  the  key  Z.  Note  that  if  we  find  a  subset, 
Y,  of  X  which  is  a  key,  then  we  need  not  construct  LISTZ  for  any 
Z  that  contains  Y.  Constructing  LISTY  for  all  Y  that  do  not 
contain  any  other  keys  will  yield  all  partial  dependencies  as 
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well  as  all  keys  of  relation  F.  We  can  then  use  Codd's  method  of 
decomposing  F  to  eliminate  all  of  the  partial  dependencies.  The 
resulting  relations  are  guaranteed  to  be  in  third  normal  form. 

The  third  normal  form  schema  obtained  in  this  way  still 
contains  too  many  relations.  Let  E  and  S  be  two  relations  in  the 
schema  with  X  a  key  of  F  and  Y  a  key  of  S.  It  is  possible  that 
X  ->Y  and  Y->X ,  in  which  case  X  and  Y  are  in  some  sense  equivalent 
keys.  If  therefore  makes  sense  to  join  F  and  S  into  a  single 
relation.  If  F.  and  S  have  any  attributes  in  common,  then  this 
join  will  reduce  redundancy.  The  join  also  reduces  the  number  of 
relation  names  by  one. 

It  is  easy  to  check  whether  or  not  X->Y  and  Y->X  are  in 
the  closure  of  the  given  set  of  FDs  by  applying  the  algorithm 
described  in  the  previous  section.  If  X  and  Y  are  equivalent 
keys,  and  the  two  relations  are  joined  together,  then  new  partial 
dependencies  may  result.  These  dependencies  can  be  eliminated  as 
described  above. 

Let  U-> V  be  a  FD  in  the  closure  of  the  given  set  of  FDs 
and  assume  U  and  V  do  not  both  appear  in  any  one  relation.  Since 
the  constructed  schema  covers  the  set  of  given  FDs,  there  must  be 
a  series  of  joins  among  relations  in  the  schema  which  results  in 
a  relation  in  which  U  and  V  are  both  present  and  functionally 
related.  It  can  be  shown  that  for  any  such  D  and  Vf  if  a  join 
were  constructed  so  that  U  and  V  appeared  in  one  relation,  then 
that  relation  must  exhibit  partial  or  transitive  dependencies. 
Since  these  dependencies  create  insertion/deletion  anomalies  as 
well  as  consistency  problems,  we  judge  that  the  constructed 
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schema  is  minimal  in  the  sense  that  joining  any  two  relations 
will  destroy  the  schema’s  normalized  characteristics. 

5 .  Conclusion 

We  wish  to  point  our  here  that  the  idea  of  functional 
dependency  is  really  just  the  formalization  of  the  classical  view 
of  data  mentioned  at  the  beginning.  Data  families  with 
predetermined  functional  associations  are  generalizations  of  the 
idea  that  data  families  receive  their  meaning  through  program 
elaboration.  What  the  idea  of  functional  dependencies  does  is  to 
recognize  that  it  is  the  triple 

{domain,  range,  function} 

which  is  the  basic  unit  of  design,  and  not  just  the  domain  and 
range.  Furthermore,  this  function  may  be  expressed  in  many  ways: 
a  functional  relation,  a  time-varying  table,  or  a  hierarchy.  What 
is  important  is  only  that  the  set  of  attributes  forming  the 
domain,  and  those  forming  the  range  are  connected  functionally  by 
a  named  function. 
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of  the  OMEGA,  project  here  at  University  of  Toronto,  are  tied  to 
implementation.  Such  optimizations  depend  upon  how  the  data  base 
is  used.  Usage  character istics  are  variable,  while  normalization 
attends  to  problems  which  are  independent  of  usage,  that  is, 
logical  problems.  Second,  we  did  not  discuss  the  semantics  of  the 
FDs  that  describe  the  data  base.  Certain  connections  among 
attributes  resulting  from  the  composition  of  FDs  may  not  be 
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